1 Map of sampling stations in the Okinawa Trough

Load required package:

library(marmap)

Import bathymetry data from NOAA:

b = getNOAA.bathy(lon1 = 115, lon2 = 140, lat1 = 35, lat2 = 20, resolution = 1)

Import dataframe with the latitude and longitude of sampling points from CTD (in decimal degrees):

pts <- read.csv("Mirai_CTD_lat_long.csv")
pts$Station<- as.character(pts$Station)
str(pts)
## 'data.frame':    14 obs. of  3 variables:
##  $ Lon    : num  126 128 127 126 125 ...
##  $ Lat    : num  26.3 25.4 25.9 26.5 25.9 ...
##  $ Station: chr  "2" "3" "4" "5" ...

Make dataframe with map labels and their lat/long:

Clon = c(127.5,131.25, 121.6, 121.03, 120.8, 126)
Clat= c(26, 31.93, 31.1, 23, 27.2, 22)
CLabels= c("Okinawa", "Japan", "China", "Taiwan", "East China Sea", "Philippine Sea")
Countries = cbind.data.frame(Clon, Clat, CLabels)
str(Countries)
## 'data.frame':    6 obs. of  3 variables:
##  $ Clon   : num  128 131 122 121 121 ...
##  $ Clat   : num  26 31.9 31.1 23 27.2 ...
##  $ CLabels: Factor w/ 6 levels "China","East China Sea",..: 4 3 1 6 2 5

Make a dataframe with vent sites:

vents <- read.csv("VentSites.csv")
str(vents)
## 'data.frame':    25 obs. of  3 variables:
##  $ Long: num  122 122 123 123 123 ...
##  $ Lat : num  24.8 24.8 25.1 24.8 24.9 ...
##  $ Name: Factor w/ 25 levels "Akuseki-jima",..: 11 12 21 24 4 5 20 22 23 8 ...

Make plot one layer at a time:

#tiff("MiraiFiltersMap.tiff", height= 8, width =8, units = 'in', res=300) # uncomment to save plot to file
plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = list(c(0, max(b), grey(.7), grey(.9), grey(.95)),
c(min(b), 0, "darkblue", "lightblue"))) 
#plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = Newblues(100), cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
#bathymetry colors
plot(b, lwd = 0.8, deep = 0, shallow = 0, step = 0, add = TRUE) # highlight coastline with solid black line
plot(b, deep=-200, shallow=-200, lwd = 0.4, drawlabels=T, add=T)  # Add -200m isobath
plot(b, deep=-1000, shallow=-1000, lwd = 0.4, drawlabels=T, add=T)  # Add -1000m isobath
plot(b, deep=-2000, shallow=-2000, lwd = 0.4, drawlabels=T, add=T)  # Add -2000m isobath
plot(b, deep=-3000, shallow=-3000, lwd = 0.4, drawlabels=T, add=T)  # Add -3000m isobath
points(pts, pch = 16, col = 'red') # Add sampling points
text(pts, labels=pts$Station, adj= 1.5, cex=1.2, size = 4) # Add sampling station numbers
text(Countries, labels=Countries$CLabels, cex= 1.7, adj = -0.2) # Add map labels 
points(vents, pch = 17, col = 'blue', size = 3)
#dev.off() #uncomment to save plot to file

2 Sequence analysis

2.1 Identify and Count Amplicon Sequence Variants (ASVs)

Sample collection, DNA extraction, PCR, & sequencing

Seawater collected from the subsurface chlorophyll maximum (SCM), mid-water column, and near-bottom was collected in 10 L Niskin bottles at each sampling station and surface seawater was collected by bucket. Five Liter replicates from each depth (4.5 from surface) were sequentially filtered through 10.0 and 0.2 um pore-size Polytetrafluoroethylene (PTFE) filters. Filters were flash frozen in liquid nitrogen and stored at -80 C until DNA extraction. DNA was extracted following the manufacturers protocols for the Qiagen/MoBio DNeasy PowerWater Kit, including the optional heating step to lyse cells. Amplicon PCR and sequencing library preparation followed the procedures described in the Illumina 16S Metagenomic Sequencing Library Preparation manual modified to include universal eukaryotic primers for the v4 region of the 18S rRNA gene (F: Stoeck et al. 2010, R: Brisbin et al. 2018) and amplicon PCR conditions most appropriate for these primers (annealing at 58C). The 220 samples were separated into 4 sequencing runs on the Illumina MiSeq sequencing platform with v3 chemistry for 2x300-bp sequencing.

Bioinformatic processing with QIIME 2

Demultiplexed paired-end sequences provided by the OIST sequencing center were imported to QIIME 2 (v2018.11) software (Bolyen et al. 2019). The Divisive Amplicon Denoising Algorithm (DADA) was implemented with the DADA2 plug-in for Qiime2 for quality filtering, chimera removal, and feature table construction (Callahan et al. 2015) separately for each sequencing run before merging the four resulting feature tables. Taxonomy was assigned to Amplicon Sequence Variants (ASVs) in the feature table by training a classifier on the Protist Ribosomal Reference (PR2) database to classify ASVs (Guillo et al. 2012). The feature table and assigned taxonomy were then exported from Qiime2 for further analysis.

Below is an example Qiime2 workflow for 1 sequencing run:

Activate an anaconda environment for qiime2: source activate qiime2-2018.11

Import sequencing data (all sequences for eacg sequencing run must be in their own directory):

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ./seqs/seqrun1/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path run1_demux-paired-end.qza

Visualize squencing results (use these graphs to decide what length to trim sequences)

qiime demux summarize --i-data run1_demux-paired-end.qza --o-visualization run1_demux.qzv

Run the DADA2 algorithm:

qiime dada2 denoise-paired --i-demultiplexed-seqs run1_demux-paired-end.qza --output-dir ./dd2run1 --o-representative-sequences run1_rep-seqs --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 260 --p-trunc-len-r 250 --p-n-threads 3

Check results:

qiime feature-table summarize --i-table ./dd2run1/table.qza --o-visualization ./dd2run1/table.qzv --m-sample-metadata-file ~/desktop/MiraiFilters/sampleMaptxt.txt 
qiime metadata tabulate --m-input-file dd2run1/denoising_stats.qza --o-visualization dd2run1/denoising_stats.qzv

Merge feature tables from multiple sequencing runs:

qiime feature-table merge --i-tables ./dd2run1/table.qza --i-tables ./dd2run2/table.qza --i-tables ./dd2run3/table.qza --i-tables ./dd2run4/table.qza --o-merged-table mergedtable.qza
qiime feature-table merge-seqs --i-data run1_rep-seqs.qza --i-data run2_rep-seqs.qza --i-data run3_rep-seqs.qza --i-data run4_rep-seqs.qza --o-merged-data merged-rep-seqs.qza

Import taxonomy database:

qiime tools import --type 'FeatureData[Sequence]' --input-path pr2_version_4.11.1_mothur.fasta --output-path pr2.qza
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path pr2_version_4.11.1_mothur.tax -output-path pr2_tax.qza

Select V4 region from PR2 sequences:

qiime feature-classifier extract-reads --i-sequences pr2.qza --p-f-primer CCAGCASCYGCGGTAATTCC --p-r-primer ACTTTCGTTCTTGATYRA --o-reads pr2_v4.qza

Train the classifier:

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads pr2_v4.qza --i-reference-taxonomy pr2_tax.qza --o-classifier pr2_v4_classifier.qza

Classify:

qiime feature-classifier classify-sklearn --i-classifier pr2_v4_classifier.qza --i-reads merged-rep-seqs.qza --o-classification taxonomy.qza
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv

Export the .tsv taxonomy file from taxonomy.qzv for use in following steps.

2.2 Load data into R & prepare phyloseq objects

Load packages:

library(tidyverse)
library(qiime2R)
library('phyloseq')
library('vegan')
library('ggplot2')
library(RColorBrewer)
library(DESeq2)
library(reshape)
library("wesanderson")
library("gridExtra")
library("metagMisc")
library("wesanderson")
library("breakaway")
library("CoDaSeq")
library("ggbiplot")
library(ggpubr)

Set default colors:

ap<- c("#F1B6A1", "#D4A52A", "#E3E5DB", "#A5CFCC", "#0E899F", "#A83860", "#ED91BC", "#DB5339", "#F58851", "#42465C", "#1E479A", "#F7CDA4", "#CF529C", "#11638C")

Convert QIIME 2 artifacts to phyloseq objects:

phyloseq<-qza_to_phyloseq(features="mergedtable.qza")

Import sample data:

metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]

metatable$Station <- as.character(metatable$Station)
metatable$filter <- as.character(metatable$filter)
metatable$VentANAHTM<-as.factor(metatable$VentANAHTM)

META<- sample_data(metatable)
str(META)
## 'data.frame':    219 obs. of  11 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 11
##   .. ..$ : Factor w/ 219 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
##   .. ..$ : chr  "2" "2" "2" "2" ...
##   .. ..$ : Factor w/ 4 levels "Bottom","Mid",..: 4 4 4 4 1 1 2 2 3 3 ...
##   .. ..$ : int  0 0 0 0 1000 1000 700 700 92 92 ...
##   .. ..$ : chr  "10" "2" "10" "2" ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "n","p","y": 3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   ..@ names    : chr  "SampleID" "Station" "Depth" "m" ...
##   ..@ row.names: chr  "A8-F17-DNA" "B8-F18-DNA" "C8-F19-DNA" "D8-F20-DNA" ...
##   ..@ .S3Class : chr "data.frame"

Load PR2 taxonomy:

taxonomy <- read.csv("taxonomy.csv", stringsAsFactors = FALSE)
names(taxonomy) <- c("row", "tax", "Confidence")
row.names(taxonomy) <-taxonomy[[1]]
taxonomy <- taxonomy[,(-1)]
taxonomy <-  separate(taxonomy, tax, c("D0","D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10", "D11", "D12", "D13", "D14"), sep = ";", fill = "right")
taxonomy <- taxonomy[,c(1:8)]
taxmat <- as.matrix(taxonomy)
TAX = tax_table(taxmat)
str(TAX)
## Formal class 'taxonomyTable' [package "phyloseq"] with 1 slot
##   ..@ .Data: chr [1:23115, 1:8] "Eukaryota" "Eukaryota" "Eukaryota" "Eukaryota" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:23115] "000a220fe8edd771dd82f6915627ef3e" "000c79760dd16d6692018deb8ea1b164" "000d9eaa21eaff08f34fb1a1a87601d5" "000e6f109e01181b2b1eda5d6c99319c" ...
##   .. .. ..$ : chr [1:8] "D0" "D1" "D2" "D3" ...

Add taxonomy to phyloseq object:

ps = merge_phyloseq(phyloseq, TAX, META)

2.3 Prevalence Plots

Prevalence plots display the total abundance of an ASV in the full data set v. the fraction of total samples that ASV is found in.

prevdf = apply(X = otu_table(ps),
               MARGIN = ifelse(taxa_are_rows(ps), yes = 1, no = 2),
               FUN = function(x){sum(x > 0)})

prevdf = data.frame(Prevalence = prevdf,
                    TotalAbundance = taxa_sums(ps),
                    tax_table(ps))

#plyr::ddply(prevdf, "D2", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})

Prevalence plot:

prevplot1<-ggplot(prevdf, aes(TotalAbundance, Prevalence / nsamples(ps),color=D2)) +  geom_point(size = 2, alpha = 0.7) + 
  theme_bw()+
  scale_x_log10() +  xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
  facet_wrap(~D1) + theme(legend.position="none")

prevplot1

We don’t apply prevalence filter because we are interested in rare ASVs in deep water masses and at hydrothermal vents.

2.4 Basic statistics

OTUs <- data.frame(otu_table(ps))

Total number of ASVs in the data set:

OTUsRS<- OTUs
OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 22656

Total number of ASVs per sample (range and mean):

OTUs0 <- OTUs!=0 #is this number not a zero? true (1) or false (0)
csums <- colSums(OTUs0) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1906
## [1] 1906
min(csumdf$csums) #49
## [1] 49
mean(csumdf$csums) #730
## [1] 729.9194

2.5 Alpha Diversity

Breakaway Modeled Richness for all samples by depth layer:

ba <- breakaway(ps)

badf<- summary(ba) %>%
  add_column("SampleID" = ps %>% otu_table %>% sample_names)

badf<- merge(badf, metatable, by = "SampleID")

baPlot <- ggplot(badf, aes(x=Depth, y=estimate, fill=filter)) + geom_boxplot() + theme_bw() + scale_fill_manual(values = c("#E3E5DB", "#11638C"), labels=c("10.0", "0.2")) + theme(text = element_text(size=14)) +ylab("Richness Estimate")

baPlot

ggsave("breakaway_4depths.tiff", width = 6, height = 4)

Breakaway alpha diversity significance testing for all samples by depth layer:

bt <- betta(summary(ba)$estimate,
            summary(ba)$error,
            make_design_matrix(ps, "Depth"))
bt$table
##                    Estimates Standard Errors p-values
## (Intercept)       645.865450        19.47771    0.000
## predictorsMid     -31.766181        39.48228    0.421
## predictorsSCM     379.222118        39.56365    0.000
## predictorsSurface  -1.443341        38.09437    0.970

The modeled richness in the SCM is significantly higher than all other sample types, which are not significantly different from each other.

Breakaway Modeled Richness for all bottom water samples by sampling station:

bottomraw<- subset_samples(ps, Depth=="Bottom")
bba <- breakaway(bottomraw)

bbadf<- summary(bba) %>%
  add_column("SampleID" = bottomraw %>% otu_table %>% sample_names)

bbadf<- merge(bbadf, metatable, by = "SampleID")
bbadf$Station_f <- factor(bbadf$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
bbaPlot <- ggplot(bbadf, aes(x=Station_f, y=estimate, fill=filter)) + geom_boxplot() + theme_bw() + scale_fill_manual(values = c("#E3E5DB", "#11638C"), labels=c("10.0", "0.2")) + theme(text = element_text(size=14)) +ylab("Richness Estimate") +xlab("Station")

bbaPlot

ggsave("breakaway_bottom.tiff", width = 6, height = 4)

Breakaway alpha diversity significance testing for bottom samples by vent (stations 2, 10), plume (station 11), or no hydrothermal activity (all other stations):

bt <- betta(summary(bba)$estimate,
            summary(bba)$error,
            make_design_matrix(bottomraw, "VentPlume"))
bt$table
##             Estimates Standard Errors p-values
## (Intercept)  633.6996        28.73865    0.000
## predictorsp -112.9082       104.57203    0.280
## predictorsy  137.8603        73.94473    0.062

Vent/Plume sites are not more or less diverse than sites without hydrothermal influence.

2.6 Taxonomic Filtering

  • Remove ASV’s without a D1 (“Kingdom”) taxonomy assignment
  • Remove ASV’s assigned as Metazoa at D2
#table(tax_table(psN)[, "D1"], exclude = NULL)
ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")
OTUs <- data.frame(otu_table(ps))

Check ASV richness to see how much it changes after filtering:

OTUs0 <- OTUs!=0
csums <- colSums(OTUs0)
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1800
## [1] 1800
min(csumdf$csums) #45
## [1] 45
mean(csumdf$csums) #685
## [1] 684.3128

2.7 Distance and Ordination

2.7.1 All Samples

2.7.1.1 Atchison distance (CLR + Euclidean)

  • compute CLR normalization, CLR = centered log-ratio, log(x/gx) where gx is the geomentric mean of vector x
  • Then use Euclidean distance
  • PCoA and/or PCA
  • PERMANOVA
OTU4clr<- data.frame(t(data.frame(otu_table(ps))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)

row.names(metatable) <- gsub("-", "", row.names(metatable))
META2<- sample_data(metatable)

psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c("10.0", "0.2")) +geom_point(size=3) +theme(text = element_text(size=16)) 
p

#ggsave("pcoa_allsamples.tiff", width = 6, height = 4)

2.7.2 Ordination for sample subgroups (Depth and Depth/Filter-type)

The strongest determinant of clustering seems to be depth. To better observe clustering within depth groups, subset by depth and then filter type.

2.7.2.1 Surface

physeqSurf<- subset_samples(psCLR, Depth == "Surface")
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf<-plot_ordination(physeqSurf, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values= c(ap[4]))+ geom_point(size=3) +ggtitle("Surface") +theme(text = element_text(size=16)) + guides(color = FALSE) + scale_shape_discrete(labels=c("10.0", "0.2"))
pSurf

Samples cluster by filter type.

2.7.2.2 Surface 10.0

PCA

physeqSurf<- subset_samples(psCLR, Depth == "Surface" & filter == "10")
OTUsSurf10 <- data.frame(otu_table(physeqSurf))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSurf10),]
Surface_10<- prcomp(OTUsSurf10)
plot(Surface_10, type="lines")

PCA_Surf10<-ggbiplot(Surface_10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Surface 10.0")
PCA_Surf10

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSurf10, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSurf10, method = "euclidean") ~      Region_SKN, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model     R2 Pr(>F)   
## Region_SKN  2      5544  2772.1  1.7612 0.1235  0.005 **
## Residuals  25     39350  1574.0         0.8765          
## Total      27     44894                 1.0000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsSurf10, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2         F   df     p  p.adj
## 1     KG.NOT 0.12504910 2.5725831 1;18 0.002 0.0060
## 2     KG.SOT 0.06560963 0.9830311 1;14 0.455 0.4550
## 3    NOT.SOT 0.08495903 1.6712504 1;18 0.007 0.0105
  • KG to North - Diff
  • KG to South - NOT diff
  • North to South - Diff

2.7.2.3 Surface 0.2

PCA

physeqSurf2<- subset_samples(psCLR, Depth == "Surface" & filter == "2")
#tiff("variance_surface2.tiff", height= 8, width =8, units = 'in', res=70) # uncomment to save plot to file
OTUsSurf2 <- data.frame(otu_table(physeqSurf2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSurf2),]
surface2 <- prcomp(OTUsSurf2)
plot(surface2, type="lines")

#dev.off()
PCA_Surf2<-ggbiplot(surface2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Surface 0.2")
PCA_Surf2

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSurf2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSurf2, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## Region_SKN  2      8964  4481.9   2.673 0.18217  0.001 ***
## Residuals  24     40241  1676.7         0.81783           
## Total      26     49205                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsSurf2, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p  p.adj
## 1     KG.NOT 0.17432173 3.800259 1;18 0.001 0.0015
## 2     KG.SOT 0.09142048 1.308049 1;13 0.151 0.1510
## 3    NOT.SOT 0.13974110 2.761493 1;17 0.001 0.0015
  • KG to North - Diff
  • KG to South - NOT diff
  • North to South - Diff

2.7.3 SCM

physeqSCM<- subset_samples(psCLR, Depth == "SCM")
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM<-plot_ordination(physeqSCM, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[5]))+ geom_point(size=3) +ggtitle("SCM") +theme(text = element_text(size=16))
pSCM

Again, strong clustering by filter type.

2.7.3.1 SCM 10.0

PCA

physeqSCM10<- subset_samples(psCLR, Depth == "SCM" & filter =="10")
OTUsSCM10 <- data.frame(otu_table(physeqSCM10))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSCM10),]
SCM10 <- prcomp(OTUsSCM10)
plot(SCM10, type="lines")

PCA_SCM10<-ggbiplot(SCM10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("SCM 10.0")
PCA_SCM10

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSCM10, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSCM10, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## Region_SKN  2     10012  5006.2  1.4675 0.10897  0.011 *
## Residuals  24     81871  3411.3         0.89103         
## Total      26     91883                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsSCM10, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p  p.adj
## 1     KG.NOT 0.09150520 1.712270 1;17 0.010 0.0285
## 2     KG.SOT 0.08412546 1.194084 1;13 0.174 0.1740
## 3    NOT.SOT 0.07616598 1.484019 1;18 0.019 0.0285
  • KG and NOT = Different
  • NOT and SOT = Different
  • KG and SOT = Not Different

2.7.3.2 SCM 0.2

PCA

physeqSCM2<- subset_samples(psCLR, Depth == "SCM" & filter == "2")
OTUsSCM2 <- data.frame(otu_table(physeqSCM2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsSCM2),]
SCM2 <- prcomp(OTUsSCM2)
plot(SCM2, type="lines")

PCA_SCM2<-ggbiplot(SCM2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("SCM 0.2")
PCA_SCM2

PERMANOVA

set.seed(1)
adonis(vegdist(OTUsSCM2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsSCM2, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## Region_SKN  2      6674  3336.8  1.1823 0.09705  0.153
## Residuals  22     62094  2822.4         0.90295       
## Total      24     68768                 1.00000

2.7.4 Mid

physeqMid<- subset_samples(psCLR, Depth == "Mid")
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid<-plot_ordination(physeqMid, ordu, color="Depth", shape = "filter")+theme_bw() +scale_color_manual(values=ap[12])+ geom_point(size=3) +ggtitle("Mid")+theme(text = element_text(size=16))
pMid

Samples still cluster by filter pore-size, but now on the secondary axis.

2.7.4.1 Mid 10.0

PCA

physeqMid10<- subset_samples(psCLR, Depth == "Mid" & filter == "10")
OTUsMid10 <- data.frame(otu_table(physeqMid10))
meta <- metatable[row.names(metatable) %in% row.names(OTUsMid10),]
Mid10 <- prcomp(OTUsMid10)
plot(Mid10, type="lines")

PCA_Mid10<-ggbiplot(Mid10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Mid 10.0")
PCA_Mid10

PERMANOVA by Region

adonis(vegdist(OTUsMid10, method = "euclidean") ~ Region_SKN, data = meta, permutations = 999)
## 
## Call:
## adonis(formula = vegdist(OTUsMid10, method = "euclidean") ~ Region_SKN,      data = meta, permutations = 999) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## Region_SKN  2      4931  2465.6  1.2669 0.09923  0.075 .
## Residuals  23     44763  1946.2         0.90077         
## Total      25     49694                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

not significant at our cut-off

2.7.4.2 Mid 0.2

PCA

physeqMid2<- subset_samples(psCLR, Depth == "Mid" & filter == "2")
OTUsMid2 <- data.frame(otu_table(physeqMid2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsMid2),]
Mid2 <- prcomp(OTUsMid2)
plot(Mid2, type="lines")

PCA_Mid2<-ggbiplot(Mid2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Mid 0.2")
PCA_Mid2

#ggsave("PCA_Mid2.png")

PERMANOVA by Region

set.seed(1)
adonis(vegdist(OTUsMid2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsMid2, method = "euclidean") ~ Region_SKN,      data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## Region_SKN  2      6883  3441.3  1.6878 0.13302  0.001 ***
## Residuals  22     44857  2038.9         0.86698           
## Total      24     51739                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsMid2, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p  p.adj
## 1     KG.NOT 0.12483288 2.139583 1;15 0.009 0.0270
## 2     KG.SOT 0.11303007 1.401773 1;11 0.062 0.0620
## 3    NOT.SOT 0.07811164 1.525141 1;18 0.019 0.0285
  • North to South = diff
  • KG to North = diff
  • KG to South = not diff

2.7.5 Bottom

physeqBottom<- subset_samples(psCLR, Depth == "Bottom")
ordu = ordinate(physeqBottom, "PCoA", "euclidean")
pBottom<-plot_ordination(physeqBottom, ordu, color="Depth", shape = "filter")+theme_bw() +scale_color_manual(values=ap[8])+ geom_point(size=3) +ggtitle("Bottom")+theme(text = element_text(size=16))
pBottom 

No clustering by filter type.

For consistency, split by filter type for the NOT/SOT/KG biogeography:

2.7.5.1 Bottom 10.0

PCA

physeqBottom10<- subset_samples(psCLR, Depth == "Bottom" & filter == "10")
OTUsBottom10 <- data.frame(otu_table(physeqBottom10))
meta <- metatable[row.names(metatable) %in% row.names(OTUsBottom10),]
Bottom10 <- prcomp(OTUsBottom10)
plot(Bottom10, type="lines")

PCA_Bottom10<-ggbiplot(Bottom10, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Bottom 10.0")
PCA_Bottom10

PERMANOVA by Region

set.seed(1)
adonis(vegdist(OTUsBottom10, method = "euclidean") ~ Region_SKN , data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsBottom10, method = "euclidean") ~      Region_SKN, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## Region_SKN  2      5815  2907.4  1.2317 0.09309  0.068 .
## Residuals  24     56651  2360.4         0.90691         
## Total      26     62465                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

not significant at our cut-off

2.7.5.2 Bottom 0.2

PCA

physeqBottom2<- subset_samples(psCLR, Depth == "Bottom" & filter == "2")
OTUsBottom2 <- data.frame(otu_table(physeqBottom2))
meta <- metatable[row.names(metatable) %in% row.names(OTUsBottom2),]
Bottom2 <- prcomp(OTUsBottom2)
plot(Bottom2, type="lines")

PCA_Bottom2<-ggbiplot(Bottom2, var.axes = FALSE, groups= meta$Region_SKN, ellipse=TRUE ) +theme_bw() +scale_color_manual(values=c(ap[4], ap[11], ap[2]))+ theme(text = element_text(size=14)) +  theme(legend.position="bottom") +ggtitle("Bottom 0.2")
PCA_Bottom2

PERMANOVA by region

set.seed(1)
adonis(vegdist(OTUsBottom2, method = "euclidean") ~ Region_SKN, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsBottom2, method = "euclidean") ~      Region_SKN, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## Region_SKN  2      7125  3562.5  1.5119 0.11619  0.007 **
## Residuals  23     54197  2356.4         0.88381          
## Total      25     61322                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsBottom2, method = "euclidean"), group.var="Region_SKN")
tst$Adonis.tab
##   Comparison         R2        F   df     p p.adj
## 1     KG.NOT 0.06908014 1.187301 1;16 0.165 0.165
## 2     KG.SOT 0.09775585 1.408517 1;13 0.030 0.045
## 3    NOT.SOT 0.10567637 2.008779 1;17 0.001 0.003
  • North to South = diff
  • KG to North = not diff
  • KG to South = diff

Pattern reversed from that seen in Mid, SCM, and Surface

Supplemental Figure 1

supp1<-ggarrange(pSurf, pSCM, pMid, pBottom, common.legend = TRUE, legend = "bottom")
supp1

#ggsave("Supplemental1.png", width = 6, height = 6 )

Bottom-water by Hydrothermal Activity

For Vent v. Plume - use 10.0 and 0.2 samples together because 0.2 and 10.0 pore-size filters do not cluster separately in the PCoA and necessary because there are a low number of vent sites compared to non-vent sites

PERMANOVA

set.seed(1)
physeqBottom<- subset_samples(psCLR, Depth == "Bottom")
OTUsBottom <- data.frame(otu_table(physeqBottom))
meta <- metatable[row.names(metatable) %in% row.names(OTUsBottom),]
adonis(vegdist(OTUsBottom, method = "euclidean") ~ VentPlume, data = meta)
## 
## Call:
## adonis(formula = vegdist(OTUsBottom, method = "euclidean") ~      VentPlume, data = meta) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model     R2 Pr(>F)    
## VentPlume  2      9303  4651.3  1.9195 0.0713  0.001 ***
## Residuals 50    121160  2423.2         0.9287           
## Total     52    130462                 1.0000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise PERMANOVA

tst<-adonis_pairwise(x=meta, dd=vegdist(OTUsBottom, method = "euclidean"), group.var="VentPlume")
tst$Adonis.tab
##   Comparison         R2         F   df     p p.adj
## 1        n.p 0.03681976 1.6437730 1;43 0.012 0.018
## 2        n.y 0.04858759 2.4002384 1;47 0.001 0.003
## 3        p.y 0.08956956 0.9838155 1;10 0.429 0.429
  • Vents v. Plume = Not diff
  • Vents v. NO hydrothermal activity = Different
  • Plume v. NO hydrothermal activity = Different

PCA

Bottom <- prcomp(OTUsBottom)
plot(Bottom, type="lines")

PCA_Bottom<-ggbiplot(Bottom, var.axes = FALSE, groups= meta$VentPlume, ellipse=TRUE ) +theme_bw() +scale_color_manual(values= c(wes_palette("Cavalcanti1")[4], wes_palette("Cavalcanti1")[1], wes_palette("Cavalcanti1")[5] ), labels= c("none", "plume", "vent"))+ theme(text = element_text(size=16)) +  theme(legend.position="bottom")
PCA_Bottom

2.8 Relative Abundance Plots

Merge Samples –> collapse replicates (for Relative Abundance Plots)

Prepare metadata table:

metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]
META<- sample_data(metatable)
ps2<- ps
sample_data(ps2) <- META
print(ps2)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 19574 taxa and 211 samples ]
## sample_data() Sample Data:       [ 211 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 19574 taxa by 8 taxonomic ranks ]
str(sample_data(ps2))
## 'data.frame':    211 obs. of  11 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 11
##   .. ..$ : chr  "11" "10" "10" "10" ...
##   .. ..$ : chr  "Mid" "Bottom" "Mid" "SCM" ...
##   .. ..$ : int  700 1510 700 85 85 85 700 700 1510 1510 ...
##   .. ..$ : chr  "2" "2" "10" "2" ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 3 levels "n","p","y": 2 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 1 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : chr  "11-Mid-2" "10-Bottom-2" "10-Mid-10" "10-SCM-2" ...
##   ..@ names    : chr  "Station" "Depth" "m" "filter" ...
##   ..@ row.names: chr  "F142" "F150" "F151" "F154" ...
##   ..@ .S3Class : chr "data.frame"

Merge:

mergedps <- merge_samples(ps2, "desc")
print(mergedps)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 19574 taxa and 112 samples ]
## sample_data() Sample Data:       [ 112 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 19574 taxa by 8 taxonomic ranks ]

number of samples reduced from 211 to 112

sample_names(mergedps)
##   [1] "10-Bottom-10"  "10-Bottom-2"   "10-Mid-10"     "10-Mid-2"     
##   [5] "10-SCM-10"     "10-SCM-2"      "10-Surface-10" "10-Surface-2" 
##   [9] "11-Bottom-10"  "11-Bottom-2"   "11-Mid-10"     "11-Mid-2"     
##  [13] "11-SCM-10"     "11-SCM-2"      "11-Surface-10" "11-Surface-2" 
##  [17] "12-Bottom-10"  "12-Bottom-2"   "12-Mid-10"     "12-Mid-2"     
##  [21] "12-SCM-10"     "12-SCM-2"      "12-Surface-10" "12-Surface-2" 
##  [25] "13-Bottom-10"  "13-Bottom-2"   "13-Mid-10"     "13-Mid-2"     
##  [29] "13-SCM-10"     "13-SCM-2"      "13-Surface-10" "13-Surface-2" 
##  [33] "14-Bottom-10"  "14-Bottom-2"   "14-Mid-10"     "14-Mid-2"     
##  [37] "14-SCM-10"     "14-SCM-2"      "14-Surface-10" "14-Surface-2" 
##  [41] "15-Bottom-10"  "15-Bottom-2"   "15-Mid-10"     "15-Mid-2"     
##  [45] "15-SCM-10"     "15-SCM-2"      "15-Surface-10" "15-Surface-2" 
##  [49] "17-Bottom-10"  "17-Bottom-2"   "17-Mid-10"     "17-Mid-2"     
##  [53] "17-SCM-10"     "17-SCM-2"      "17-Surface-10" "17-Surface-2" 
##  [57] "18-Bottom-10"  "18-Bottom-2"   "18-Mid-10"     "18-Mid-2"     
##  [61] "18-SCM-10"     "18-SCM-2"      "18-Surface-10" "18-Surface-2" 
##  [65] "2-Bottom-10"   "2-Bottom-2"    "2-Mid-10"      "2-Mid-2"      
##  [69] "2-SCM-10"      "2-SCM-2"       "2-Surface-10"  "2-Surface-2"  
##  [73] "3-Bottom-10"   "3-Bottom-2"    "3-Mid-10"      "3-Mid-2"      
##  [77] "3-SCM-10"      "3-SCM-2"       "3-Surface-10"  "3-Surface-2"  
##  [81] "4-Bottom-10"   "4-Bottom-2"    "4-Mid-10"      "4-Mid-2"      
##  [85] "4-SCM-10"      "4-SCM-2"       "4-Surface-10"  "4-Surface-2"  
##  [89] "5-Bottom-10"   "5-Bottom-2"    "5-Mid-10"      "5-Mid-2"      
##  [93] "5-SCM-10"      "5-SCM-2"       "5-Surface-10"  "5-Surface-2"  
##  [97] "8-Bottom-10"   "8-Bottom-2"    "8-Mid-10"      "8-Mid-2"      
## [101] "8-SCM-10"      "8-SCM-2"       "8-Surface-10"  "8-Surface-2"  
## [105] "9-Bottom-10"   "9-Bottom-2"    "9-Mid-10"      "9-Mid-2"      
## [109] "9-SCM-10"      "9-SCM-2"       "9-Surface-10"  "9-Surface-2"

But metatable inside phyloseq object was disturbed …

str(sample_data(mergedps))
## 'data.frame':    112 obs. of  11 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 11
##   .. ..$ : num  10 10 10 10 10 10 10 10 11 11 ...
##   .. ..$ : num  NA NA NA NA NA NA NA NA NA NA ...
##   .. ..$ : num  1510 1510 700 700 85 85 0 0 1210 1210 ...
##   .. ..$ : num  10 2 10 2 10 2 10 2 10 2 ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 1 1 ...
##   .. ..$ : num  NA NA NA NA NA NA NA NA NA NA ...
##   ..@ names    : chr  "Station" "Depth" "m" "filter" ...
##   ..@ row.names: chr  "10-Bottom-10" "10-Bottom-2" "10-Mid-10" "10-Mid-2" ...
##   ..@ .S3Class : chr "data.frame"

Fix meta table in phyloseq object:

meta2<-as.data.frame(sample_data(mergedps))
split<- do.call(rbind, strsplit(row.names(meta2), "-"))
meta2$Depth <- split[,2]
meta2$filter<-split[,3]
meta2$desc <- row.names(meta2)
meta2$Station <- as.character(meta2$Station)
meta2$Depth_f <- factor(meta2$Depth, levels=c('Surface','SCM','Mid','Bottom'))
meta2$Station_f <- factor(meta2$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
META <-sample_data(meta2)
sample_data(mergedps)<-META

Fixed!

str(sample_data(mergedps))
## 'data.frame':    112 obs. of  13 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 13
##   .. ..$ : chr  "10" "10" "10" "10" ...
##   .. ..$ : chr  "Bottom" "Bottom" "Mid" "Mid" ...
##   .. ..$ : num  1510 1510 700 700 85 85 0 0 1210 1210 ...
##   .. ..$ : chr  "10" "2" "10" "2" ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : num  3 3 3 3 3 3 3 3 2 2 ...
##   .. ..$ : num  4 4 4 4 4 4 4 4 4 4 ...
##   .. ..$ : num  2 2 2 2 2 2 2 2 1 1 ...
##   .. ..$ : chr  "10-Bottom-10" "10-Bottom-2" "10-Mid-10" "10-Mid-2" ...
##   .. ..$ : Factor w/ 4 levels "Surface","SCM",..: 4 4 3 3 2 2 1 1 4 4 ...
##   .. ..$ : Factor w/ 14 levels "11","10","9",..: 2 2 2 2 2 2 2 2 1 1 ...
##   ..@ names    : chr  "Station" "Depth" "m" "filter" ...
##   ..@ row.names: chr  "10-Bottom-10" "10-Bottom-2" "10-Mid-10" "10-Mid-2" ...
##   ..@ .S3Class : chr "data.frame"

Taxa glom - for clean RA plots (so that there is a not a line in the plot for each ASV)

mergedps<- subset_taxa(mergedps, D2 != "Metazoa")
mergedps<-subset_taxa(mergedps, D1 != "")
mergedps<-subset_taxa(mergedps, D1 != "NA")

mergedps = tax_glom(mergedps, "D2")
'%ni%' <- Negate('%in%')
LowPrev<- c("Alveolata_X", "Amoebozoa_X", "Apicomplexa", "Apusomonadidae", "Conosa", "Discoba", "Eukaryota_XX", "Foraminifera", "Hilomonadea","Fungi", "Katablepharidophyta", "Lobosa", "Mesomycetozoa", "Metamonada", "Metazoa", "Opisthokonta_X", "Perkinsea", "Streptophyta", "Rhodophyta","Telonemia", "")
mergedHighPrev<- subset_taxa(mergedps, D2 %ni% LowPrev)
mergedRAhp <- transform_sample_counts(mergedHighPrev, function(OTU) 100* OTU/sum(OTU))

ap2<- c("#F7CDA4", "#DB5339", "#E3E5DB", "#A5CFCC", "#11638C", "#A83860","#D4A52A", "#CF529C", "#1E479A", "#F1B6A1", "#42465C",  "#0E899F")


taxabarplot<-plot_bar(mergedRAhp, x= "Station_f", fill = "D2", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values= ap2) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D2), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

taxaplot_nolegend <- taxabarplot + theme(legend.position="none")

taxaplot_nolegend

#ggsave("taxaplot_nolegend_newcolors.pdf", width = 8, height = 5)

taxabarplotD1<-plot_bar(mergedRAhp, x= "Station_f", fill = "D1", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=ap) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D1), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

taxaplotD1_nolegend <- taxabarplotD1 + theme(legend.position="none")

taxaplotD1_nolegend

BottomTaxa<-subset_samples(mergedRAhp, Depth == "Bottom")

Bottombarplot<-plot_bar(BottomTaxa, x= "Station_f", fill = "D2", facet_grid=~filter) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=ap2) + theme(legend.title=element_blank()) + geom_bar(aes(fill=D2), stat="identity", position="stack", width =0.9) +theme_classic() + theme(text = element_text(size=14))+theme(axis.text.x = element_text(angle = 90)) + xlab("Station") +ylab("Relative Abundance(%)")

Bottombarplot_nolegend <- Bottombarplot+ theme(legend.position="none")

Bottombarplot_nolegend

#ggsave("Bottombarplot_nolegend.pdf", width=8, height =4)

2.9 DESeq2 – Differential abundance testing

subset rawcount phyloseq object to just hold bottom samples:

bottomraw<- subset_samples(ps, Depth=="Bottom")

First, with only station 2 and 10 considered as hydrothermal sites:

ddse <- phyloseq_to_deseq2(bottomraw, ~VentANAHTM)
## converting counts to integer mode
ddse2 <- DESeq(ddse, test="Wald", fitType="parametric")
res<- results(ddse2,  cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
dim(sigtab)
## [1] 24  6

And if station 11 is included …

ddse <- phyloseq_to_deseq2(bottomraw, ~Vent)
## converting counts to integer mode
ddse2 <- DESeq(ddse, test="Wald", fitType="parametric")
res<- results(ddse2,  cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
dim(sigtab)
## [1] 46  6
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(bottomraw)[rownames(sigtab), ], "matrix"))
sigtab = sigtab[sigtab$D2 != "Fungi",] # this fungus classification seems to be a mistake (based on original publication) and would otherwise only be classified as Eukaryota and, therefore, filtered from our data set at the very beginning of the analysis
x = tapply(sigtab$log2FoldChange, sigtab$D2, function(x) max(x))
x = sort(x, TRUE)
sigtab$D2 = factor(as.character(sigtab$D2), levels=names(x))
#D3 level
x = tapply(sigtab$log2FoldChange, sigtab$D3, function(x) max(x))
x = sort(x, TRUE)
sigtab$D3 = factor(as.character(sigtab$D3), levels=names(x))
# D4 level
x = tapply(sigtab$log2FoldChange, sigtab$D4, function(x) max(x))
x = sort(x, TRUE)
sigtab$D4 = factor(as.character(sigtab$D4), levels=names(x))
#plot F0
sigplot<- ggplot(sigtab, aes(x=D2, y=log2FoldChange, color = D3)) + geom_point(size=5, alpha = 0.7) + theme(legend.title=element_blank()) + theme_bw() +
   ggtitle(" ") +
  theme(axis.text.x = element_text(angle = -45, hjust = 0, vjust=0.75)) + scale_color_manual(values=ap) + theme(text = element_text(size=16))
noLegendsigplot <- sigplot +theme(legend.position = "none")
noLegendsigplot

#ggsave("sigplotnoLegend.pdf", height = 4, width = 6)
sigplot

#ggsave("sigplot.pdf")